Empirical Risk Minimization
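The true data distribution is unknown, so we approximate the expected loss under it (the true risk) by the average loss over the observed training samples (the empirical risk) and minimize that instead.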
Formula
$$ \begin{aligned} \underline{w}^{*} &= \arg\min_{\underline{w}} \mathbb{E}_{p_{\text{true}}(\underline{x}, y)}[\ell( h_{\underline{w}}(\underline{x}), y)] &\text{[True Risk]} \\ &\approx \arg\min_{\underline{w}} \frac{1}{N} \sum_{i=1}^{N} \ell(p_i, y_i) &\text{[Empirical Risk]} \end{aligned} $$
Here $p_i = h_{\underline{w}}(\underline{x}_i)$ is the prediction for the $i$-th training sample.
Nature
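$$ \begin{aligned} \mathbb{E}_{p_{\text{true}}(\underline{x}, y)}[\ell( h_{\underline{w}}(\underline{x}), y)] &> \frac{1}{N} \sum_{i=1}^{N} \ell(p_i, y_i) \end{aligned} $$
"Generalization Error" > "Training Error"
This typically holds for weights fitted by ERM: they are chosen to make the training loss small, so the empirical risk on those same samples is an optimistic estimate of the true risk.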
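The gap between the two risks can be seen numerically. The following is a minimal sketch, not part of the original notes: it assumes a linear hypothesis $h_{\underline{w}}(\underline{x}) = \underline{w}^\top \underline{x}$, squared loss, and synthetic Gaussian data, and it uses a large held-out sample as a Monte Carlo stand-in for the true risk. The helper names (`empirical_risk`, `erm_fit`, `run_trial`) and all sizes are illustrative choices.

```python
import numpy as np


def empirical_risk(w, X, y):
    """Average squared loss (1/N) * sum_i (h_w(x_i) - y_i)^2 for linear h_w."""
    predictions = X @ w
    return float(np.mean((predictions - y) ** 2))


def sample(rng, n, w_true, noise_std):
    """Draw n pairs (x, y) from an assumed synthetic distribution."""
    X = rng.normal(size=(n, w_true.size))
    y = X @ w_true + noise_std * rng.normal(size=n)
    return X, y


def erm_fit(X, y):
    """Minimize the empirical squared-loss risk via least squares."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def run_trial(rng, n_train=20, n_test=20_000, dim=10, noise_std=1.0):
    w_true = rng.normal(size=dim)
    X_train, y_train = sample(rng, n_train, w_true, noise_std)
    # Large fresh sample: a Monte Carlo proxy for the true risk.
    X_test, y_test = sample(rng, n_test, w_true, noise_std)
    w_hat = erm_fit(X_train, y_train)
    return {
        "train_risk": empirical_risk(w_hat, X_train, y_train),
        "train_risk_of_w_true": empirical_risk(w_true, X_train, y_train),
        "test_risk": empirical_risk(w_hat, X_test, y_test),
    }


rng = np.random.default_rng(0)
trials = [run_trial(rng) for _ in range(200)]
mean_train = float(np.mean([t["train_risk"] for t in trials]))
mean_test = float(np.mean([t["test_risk"] for t in trials]))
print(f"mean training error:       {mean_train:.3f}")
print(f"mean generalization error: {mean_test:.3f}")
```

With few training samples relative to the number of weights, the averaged training error should come out clearly below the held-out estimate. Note that on any single draw the inequality is a tendency, not a guarantee; what is guaranteed is that the fitted weights never have a higher training risk than any other weight vector, including the one that generated the data.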